A Comparative Study of Hard and Fuzzy Data Clustering Algorithms with Cluster Validity Indices
Abstract
Data clustering is one of the important data mining methods. It partitions a data set into classes so that similarity within each class is maximized and similarity between different classes is minimized. The well-known hard clustering algorithm (K-means) and fuzzy clustering algorithm (FCM) are mostly based on the Euclidean distance measure. This paper presents a comparative study of these algorithms with other distance measures, namely Chebyshev and Chi-square. The resulting algorithms are tested on four well-known data sets from the UCI repository: Contraceptive Method Choice (CMC), Diabetes, Liver Disorders and Statlog (Heart). Experimental results show that FCM with the Chi-square distance measure gives better results than FCM with the Chebyshev distance measure. We also propose an FCM algorithm based on the σ-distance measure. The FCM algorithm is additionally evaluated with cluster validity indices, namely the partition coefficient and the partition entropy. On these indices, the Chebyshev distance measure gives the maximum partition coefficient and the minimum partition entropy among the distance measures compared. This paper also provides a brief review of applications of the K-means and fuzzy c-means algorithms.
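The distance measures and validity indices named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Chi-square form used here (squared difference divided by the sum, for non-negative features) is one common definition and is an assumption, as are the function names, the fuzzifier m = 2 and the toy data. The membership formula is the standard FCM update, applied unchanged to each distance as a heuristic. The partition coefficient and partition entropy follow Bezdek's usual definitions. The σ-distance is not defined in the abstract, so it is left out.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(x, y))

def chi_square(x, y):
    # Assumed form: sum of (a - b)^2 / (a + b); terms with a + b == 0 are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)

def memberships(points, centers, dist, m=2.0, eps=1e-12):
    """Standard FCM membership update with a pluggable distance function."""
    power = 2.0 / (m - 1.0)
    U = []
    for x in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        d = [max(dist(x, c), eps) for c in centers]
        U.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return U

def partition_coefficient(U):
    # Mean of squared memberships; 1.0 for a crisp partition, 1/c for a uniform one.
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    # Mean entropy of the membership rows; 0.0 for a crisp partition.
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

# Hypothetical toy data and fixed centers, for illustration only.
points = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (9.0, 9.5), (5.0, 5.0)]
centers = [(1.2, 1.9), (8.5, 8.8)]

for name, dist in [("Euclidean", euclidean),
                   ("Chebyshev", chebyshev),
                   ("Chi-square", chi_square)]:
    U = memberships(points, centers, dist)
    print(name, partition_coefficient(U), partition_entropy(U))
```

A higher partition coefficient and a lower partition entropy both indicate a crisper fuzzy partition, which is the sense in which the abstract compares the distance measures.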
Similar resources
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In the current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. The proposed method can find fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are utilized to represent uncertain data. To compare triangular fuzzy ...
EXCLUVIS: A MATLAB GUI Software for Comparative Study of Clustering and Visualization of Gene Expression Data
The result of one clustering algorithm varies from that of another for the same input dataset, as the input parameters of an algorithm can substantially affect its behavior and execution. Cluster validity measures can be used to find the partitioning that best fits the underlying data. In most realistic applications, this analysis can be visualized using simple Computer-Aided-...
A New Validity Measure for Heuristic Possibilistic Clustering
A heuristic approach to possibilistic clustering is an effective tool for data analysis. The approach is based on the concept of allotment among fuzzy clusters. To establish the number of clusters in a data set, a validity measure is proposed in this paper. An illustrative example of the application of the proposed validity measure to Anderson’s Iris data is given. A comparison of the vali...
Fuzzy Cluster Quality Index using Decision Theory
Clustering can be defined as the process of grouping physical or abstract objects into classes of similar objects. It is an unsupervised learning problem of organizing unlabeled objects into natural groups in such a way that objects in the same group are more similar to each other than to objects in different groups. Conventional clustering algorithms cannot handle uncertainty that exists in the real life...